FEAT: Add ArabiziConverter for Arabic transliteration by Raulster24 · Pull Request #1906 · microsoft/PyRIT

Raulster24 · 2026-06-03T18:01:46Z

Description

Adds ArabiziConverter, a deterministic PromptConverter that transliterates Arabic script into Arabizi (Latin-script "chat Arabic"), where letters with no Latin equivalent are written with shape-resembling digits (HAH -> 7, AIN -> 3, QAF -> 8). It applies a per-character mapping with Gulf-leaning conventions; no language model is involved, so the same input always produces the same output. Short-vowel diacritics and the tatweel connector are dropped, and non-Arabic text (Latin, digits, punctuation) is left unchanged. The mapping is intentionally lossy, mirroring how Arabizi is actually written.

The mapping follows the documented Arabic chat alphabet (Gulf-leaning where regional variants exist, e.g. QAF -> 8 with GHAIN -> gh to avoid the regional 8 collision). Feedback on specific letter choices is welcome.
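For readers who want to see the shape of the approach, here is a minimal standalone sketch of a per-character transliteration with dropped marks and passthrough. It is not the PR's implementation: the function name `to_arabizi`, the table name `ARABIZI_MAP`, and every mapping entry marked "assumed" are illustrative, and the dropped range U+064B..U+0652 is an assumption about which marks count as diacritics. Only HAH, AIN, QAF, and GHAIN come from the description above.

```python
# Illustrative subset only; the PR's real mapping table is not reproduced here.
ARABIZI_MAP = {
    "\u062D": "7",   # HAH   (stated in the PR description)
    "\u0639": "3",   # AIN   (stated in the PR description)
    "\u0642": "8",   # QAF   (stated in the PR description)
    "\u063A": "gh",  # GHAIN (stated in the PR description)
    "\u0628": "b",   # BEH   (assumed)
    "\u0644": "l",   # LAM   (assumed)
    "\u0645": "m",   # MEEM  (assumed)
    "\u0631": "r",   # REH   (assumed)
    "\u0627": "a",   # ALEF  (assumed)
}

# Assumed drop set: combining marks U+064B..U+0652 plus tatweel (U+0640).
DROPPED = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0640"}


def to_arabizi(text: str) -> str:
    """Transliterate character by character; unmapped characters pass through."""
    out = []
    for ch in text:
        if ch in DROPPED:
            continue  # lossy by design: the mark is simply omitted
        out.append(ARABIZI_MAP.get(ch, ch))
    return "".join(out)
```

Because the function is a pure table lookup, calling it twice on the same string necessarily yields the same result, which is the determinism property the description mentions.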

Fourth in the set of atomic Arabic-script converters, following BidiConverter (#1832), TatweelConverter (#1869), and ArabicPresentationFormConverter (#1888). It can later migrate to a shared CharacterSubstitutionConverter base alongside UnicodeConfusableConverter.

cc @romanlutz

Tests and Documentation

Added tests/unit/prompt_converter/test_arabizi_converter.py: word transliteration, number-letters, multi-character mappings, dropped diacritics/tatweel, non-Arabic passthrough, mixed text, empty input, determinism, and unsupported-input-type rejection. All pass: uv run pytest tests/unit/prompt_converter/test_arabizi_converter.py
Registered in pyrit/prompt_converter/__init__.py (import + __all__).
Added a usage example to doc/code/converters/1_text_to_text_converters.py and regenerated the paired .ipynb plus the converter modality table in 0_converters.ipynb via JupyText.
ruff and ty are clean; the converter-documentation conformance test passes.
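The test categories listed above could look roughly like the following sketch. It runs against a hypothetical stand-in class (`StandInArabiziConverter`), not the PyRIT converter, so it works without PyRIT installed; the stand-in returns a plain string and raises `ValueError` for unsupported input types, both of which are assumptions made for illustration. Only the four mappings named in the description are used.

```python
import asyncio


# Hypothetical stand-in, NOT the PyRIT class: it only mimics the shape of an
# async text converter so the test cases below are runnable on their own.
class StandInArabiziConverter:
    _MAP = {"\u062D": "7", "\u0639": "3", "\u0642": "8", "\u063A": "gh"}
    _DROPPED = {"\u0640"}  # tatweel

    async def convert_async(self, *, prompt: str, input_type: str = "text") -> str:
        if input_type != "text":
            raise ValueError(f"Input type not supported: {input_type}")
        return "".join(
            self._MAP.get(ch, ch) for ch in prompt if ch not in self._DROPPED
        )


def run(prompt: str, input_type: str = "text") -> str:
    """Drive the async stand-in synchronously for the tests."""
    converter = StandInArabiziConverter()
    return asyncio.run(converter.convert_async(prompt=prompt, input_type=input_type))


def test_number_letters():
    assert run("\u062D\u0639\u0642") == "738"


def test_multi_character_mapping():
    assert run("\u063A") == "gh"


def test_tatweel_dropped():
    assert run("\u062D\u0640\u0639") == "73"


def test_non_arabic_passthrough_and_empty():
    assert run("hello, 123!") == "hello, 123!"
    assert run("") == ""


def test_determinism():
    assert run("\u0642\u063A") == run("\u0642\u063A")


def test_unsupported_input_type():
    try:
        run("\u062D", input_type="image_path")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unsupported input type")
```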

romanlutz · 2026-06-04T05:53:28Z

FYI @Raulster24, technically we already have a character-level converter! Check LeetspeakConverter or EmojiConverter, which have a map from letters to numbers/emojis. This seems to be the same pattern for Arabizi. I don't see much potential for generalizing with a common base class, since that really comes down to a single line, but it's a good thought.

Raulster24 · 2026-06-04T13:00:15Z

@romanlutz Makes sense, you are right. WordLevelConverter already covers this pattern (Leetspeak, Emoji), so a separate base class would duplicate it. I'll drop the CharacterSubstitutionConverter idea and keep the Arabic converters standalone as they are. Thanks for catching it.

romanlutz

I reran the notebook to produce outputs. Looks great!

Raulster24 added 2 commits June 3, 2026 21:57

FEAT: Add ArabiziConverter for Arabic transliteration

457c191

Merge remote-tracking branch 'upstream/main' into raulster24/add-arab…

e61eab0

…izi-converter

jbolor21 approved these changes Jun 3, 2026

romanlutz approved these changes Jun 4, 2026

romanlutz added this pull request to the merge queue Jun 4, 2026

Merged via the queue into microsoft:main with commit 155d9af Jun 4, 2026
52 checks passed

romanlutz merged 2 commits into microsoft:main from Raulster24:raulster24/add-arabizi-converter
